Songs by genre

By Jongyoon Sohn, js5342@columbia.edu

People live in a world full of sentiments such as excitement, sadness, and happiness. Songs have long been used to express emotions about love, politics, and much more. This project presents some interesting findings about songs and how they differ by genre.


Word counts

First, let’s take a look at the number of characters of stemmed words by genre. As you can see, Rock has the largest character count, more than twice that of Hip-Hop, the next largest genre. I looked at the dt_lyrics dataset more closely and found significant imbalances in the observations, i.e. far more observations in 2006 and 2007 than in the other years, so it makes sense to compare genres by proportions rather than raw counts. For this purpose, two genres, Rock and Indie, are chosen to see whether there are any differences.
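To make the point about proportions concrete, here is a minimal sketch of turning raw word counts into within-genre shares. The token lists are made up for illustration and are not taken from dt_lyrics, and word_proportions is a hypothetical helper name, not something from the original analysis.

```python
from collections import Counter

# Hypothetical stemmed tokens per genre; the real data lives in dt_lyrics.
tokens_by_genre = {
    "Rock": ["love", "time", "love", "life", "time", "love", "night", "love"],
    "Indie": ["love", "time"],
}

def word_proportions(tokens):
    """Map each word to its share of all tokens in the group."""
    counts = Counter(tokens)
    total = sum(counts.values())
    return {word: n / total for word, n in counts.items()}

proportions = {genre: word_proportions(tokens)
               for genre, tokens in tokens_by_genre.items()}

# Raw counts of "love" differ (4 vs 1), but the shares are the same.
print(proportions["Rock"]["love"])   # 0.5
print(proportions["Indie"]["love"])  # 0.5
```

With shares, a genre that simply has more lyrics in the dataset no longer dominates the comparison.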


Which words are used the most?

These word clouds show which words are used most frequently, with more frequent words drawn in larger sizes. The two plots show that the two genres share their most frequently used words. For example, both use youre, time, and life a lot. Major differences in frequency are hard to distinguish, and common words are used frequently regardless of genre.
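The overlap that the word clouds suggest can also be checked directly. The sketch below takes the top words of each genre and intersects them. The token lists are invented for illustration, and top_words is a hypothetical helper, so treat this as one possible check rather than the method used for the plots.

```python
from collections import Counter

# Hypothetical stemmed tokens; "youre" mirrors the cleaned form in the text.
rock = ["youre"] * 4 + ["time"] * 3 + ["life"] * 2 + ["road"]
indie = ["life"] * 4 + ["youre"] * 3 + ["time"] * 2 + ["sea"]

def top_words(tokens, n):
    """Return the n most frequent words as a set."""
    return {word for word, _ in Counter(tokens).most_common(n)}

# Words that appear in the top 3 of both genres.
shared = top_words(rock, 3) & top_words(indie, 3)
print(sorted(shared))
```

A large intersection means the genres mostly differ in how often they use the same words, not in which words they use.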


Emotions come from songs

We sing because we want to express our feelings through songs, so sentiment analysis is the most important part of this project. Let’s look at some characteristics of the stemmed words using sentiment analysis based on the NRC lexicon. For each year, the frequencies of the eight basic emotions were collected and converted to percentages to show the emotion distributions.
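Here is a rough sketch of how per-year emotion percentages can be computed from a word-to-emotion lexicon. The toy_lexicon entries and the (year, word) pairs are invented for illustration and are not copied from the NRC lexicon or from dt_lyrics. The denominator is also an assumption: this version divides by the number of emotion tags in a year, while dividing by the total word count would be another reasonable choice.

```python
from collections import Counter, defaultdict

# The eight basic emotions covered by the NRC lexicon.
EMOTIONS = ["anger", "anticipation", "disgust", "fear",
            "joy", "sadness", "surprise", "trust"]

# Toy stand-in for the lexicon: word -> emotions. Illustrative entries only,
# not copied from the NRC lexicon.
toy_lexicon = {
    "love": ["joy", "trust"],
    "cry": ["sadness"],
    "fight": ["anger", "fear"],
}

# Hypothetical (year, stemmed word) pairs.
words = [(2006, "love"), (2006, "cry"), (2006, "time"),
         (2007, "fight"), (2007, "love")]

def emotion_percent_by_year(words, lexicon):
    """Return {year: {emotion: percent of that year's emotion tags}}."""
    tags = defaultdict(Counter)
    for year, word in words:
        # Words missing from the lexicon contribute no emotion tags.
        for emotion in lexicon.get(word, []):
            tags[year][emotion] += 1
    result = {}
    for year, counts in tags.items():
        total = sum(counts.values())
        result[year] = {e: 100 * counts[e] / total for e in EMOTIONS}
    return result

percents = emotion_percent_by_year(words, toy_lexicon)
print(percents[2007]["joy"])  # 25.0
```

Because every year is normalized to 100 percent, years with many observations, such as 2006 and 2007, become comparable to the sparser years.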

Emotion distributions

All eight emotions are expressed as rates of the word counts. In the first plot, joy, sadness, and trust have higher frequencies than the other emotions. In addition, the right plot shows the density of each emotion's distribution, which tells us that emotions such as surprise, disgust, and fear are approximately normally distributed, while joy, sadness, and anger are highly skewed.
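Skewness can also be put into a number instead of being read off a density plot. The sketch below uses the moment-based sample skewness, the third central moment divided by the 1.5 power of the second. The two value lists are made up to show a symmetric and a right-tailed case, and are not the emotion rates from this analysis.

```python
def skewness(values):
    """Moment-based sample skewness: m3 / m2 ** 1.5."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((x - mean) ** 2 for x in values) / n  # second central moment
    m3 = sum((x - mean) ** 3 for x in values) / n  # third central moment
    return m3 / m2 ** 1.5

# Hypothetical rates: one symmetric set and one with a long right tail.
symmetric = [1, 2, 3, 4, 5]
right_tailed = [1, 1, 1, 2, 10]

print(skewness(symmetric))      # 0.0
print(skewness(right_tailed))   # positive, long right tail
```

A value near zero is consistent with a symmetric shape such as a normal distribution but does not prove normality, so it works best alongside the density plot rather than in place of it.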

What about Hip-Hop, Indie, Rock?

This graph shows how differently emotions are distributed across genres. Ridgeline plots represent the distribution of emotions used in Hip-Hop, Indie, and Rock. The emotions used in each of the three genres are clearly distributed very differently.


Does time affect the sentiments?

I got curious whether the sentiments of songs are distributed evenly over time. This plot shows the time series of each emotion by year. Overall, joy and sadness are the most used sentiments in songs regardless of genre, but their frequencies keep changing as time goes by.

Interestingly, not all genres have similar emotion distributions. Major differences in sentiment are found between Rock and Indie. The two plots above show that the sentiment plot for Rock seems to follow the overall distribution, whereas the sentiment plot for Indie tells a completely different story. Therefore, the emotions used in different genres are distributed quite differently over time.


Conclusions